feat(cli): add --progress flag for live indexing progress output #108
Open
halindrome wants to merge 17 commits into DeusData:main from
- Declare cbm_progress_sink_init(FILE*), cbm_progress_sink_fini(), cbm_progress_sink_fn()
- cbm_progress_sink_fn matches cbm_log_sink_fn callback signature
- Include guard CBM_PROGRESS_SINK_H, includes <stdio.h> only
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…labels
- cbm_progress_sink_fn() parses msg= tag from structured log lines
- Maps pipeline.discover, pipeline.route, pass.start, pass.timing (9 passes), pipeline.done, parallel.extract.progress to human-readable stderr output
- parallel.extract.progress uses \r for in-place terminal updates
- Unknown tags pass through to previous sink (MCP UI routing preserved)
- cbm_progress_sink_init/fini save and restore previous sink
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
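
The tag parsing and label mapping described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's code: it assumes log lines are space-separated `key=value` pairs, and the function names `extract_msg_tag` and `label_for_tag` and the exact label strings are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the value of the msg= field into out.
 * Assumes space-separated key=value pairs (an assumption about the format).
 * Returns 0 on success, -1 if the field is absent or does not fit. */
int extract_msg_tag(const char *line, char *out, size_t out_size) {
    assert(line != NULL && out != NULL && out_size > 0);
    const char *p = line;
    while ((p = strstr(p, "msg=")) != NULL) {
        /* Accept only a key at the start of the line or after a space. */
        if (p == line || p[-1] == ' ') break;
        p += 4;
    }
    if (p == NULL) return -1;
    p += 4; /* skip "msg=" */
    size_t len = strcspn(p, " \n");
    if (len == 0 || len >= out_size) return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}

/* Map a tag to a display label (illustrative strings only).
 * Returns NULL for unknown tags so the caller can decide what to do
 * with them, for example forward them to the previous sink. */
const char *label_for_tag(const char *tag) {
    static const struct { const char *tag; const char *label; } table[] = {
        {"pipeline.discover", "Discovering files"},
        {"pipeline.route", "Starting full index"},
        {"pipeline.done", "Done"},
    };
    assert(tag != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].tag, tag) == 0) return table[i].label;
    }
    return NULL;
}
```

A table lookup keeps the tag-to-label mapping in one place, so adding a new event means adding one row.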
…cancel
- Scan argv for --progress before tool dispatch; strip it and shift args
- Add g_cli_pipeline global and cli_sigint_handler (calls cbm_pipeline_cancel)
- When --progress: call cbm_progress_sink_init(stderr) and register SIGINT handler
- For index_repository + --progress: bypass cbm_mcp_handle_tool, call cbm_pipeline_new/cbm_pipeline_run directly, set g_cli_pipeline before run
- Assemble JSON result (project/status/nodes/edges) via snprintf, print to stdout
- After run, call cbm_progress_sink_fini(); all progress output goes to stderr
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
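
The "strip it and shift args" step might look something like the sketch below. The helper name `strip_progress_flag` is hypothetical, and the sketch assumes `argv[*argc]` is a writable NULL slot, as it is for the `argv` passed to `main`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Remove every "--progress" from argv[1..], shifting later arguments left.
 * Updates *argc and returns true if the flag was present.
 * Assumes argv has room for a NULL terminator at index *argc. */
bool strip_progress_flag(int *argc, char **argv) {
    assert(argc != NULL && argv != NULL);
    if (*argc < 1) return false;
    bool found = false;
    int out = 1; /* argv[0] (program name) is always kept */
    for (int in = 1; in < *argc; in++) {
        if (strcmp(argv[in], "--progress") == 0) {
            found = true;
            continue; /* drop the flag */
        }
        argv[out++] = argv[in];
    }
    argv[out] = NULL; /* keep the conventional NULL terminator */
    *argc = out;
    return found;
}
```

Stripping the flag before dispatch means the tool handlers never see it, so their own argument parsing stays unchanged.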
- CLI_SRCS now includes src/cli/progress_sink.c alongside src/cli/cli.c
- Build verified clean: build/c/codebase-memory-mcp produced with no warnings
- All 2042 tests pass with no regressions
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add test_cli_progress_stderr_labels: injects pipeline.discover log event, asserts progress sink writes "Discovering" to target FILE*
- Add test_cli_progress_stdout_json: injects pass.start + pipeline.done events, asserts "[1/9]" phase label and "Done:" appear; confirms output is not JSON
- Include <cli/progress_sink.h> and <foundation/log.h> headers
- Register both tests in SUITE(cli) under group G
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Normalize alignment whitespace and line-continuation style per project clang-format configuration.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a custom log sink is registered via cbm_log_set_sink(), suppress the default fprintf(stderr, ...) output in both cbm_log() and cbm_log_int(). The sink is now the sole output handler rather than an additive listener.
Also pre-scan for --progress in main() before cbm_mem_init() so the sink is installed before mem.init fires, keeping stderr completely clean.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- volatile on g_cli_pipeline so signal handler always observes the pointer
- volatile on s_needs_newline to prevent stale-read between worker/main threads
- Fix incorrect PIPE_BUF thread-safety comment (correct reason: per-FILE* locking)
- Add comment documenting --progress silent-ignore for non-index_repository tools
- Rename test_cli_progress_stdout_json → test_cli_progress_phase_labels
- Add test_cli_progress_parallel_extract: exercises \r path + pass.timing flush
- Add test_cli_progress_unknown_tag: verifies unknown events are silently dropped
2046 tests pass.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
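
For comparison with the handler described above, here is a sketch of a different, more conservative cancellation pattern: the handler only sets a `volatile sig_atomic_t` flag, which is the kind of object standard C guarantees a signal handler can safely write, and the main loop polls it. This is not the PR's approach (which calls `cbm_pipeline_cancel` from the handler); all names here are hypothetical.

```c
#include <assert.h>
#include <signal.h>

/* Flag written by the handler and polled by the main loop. */
static volatile sig_atomic_t g_cancel_requested = 0;

/* Handler does the minimum: record that Ctrl-C was pressed. */
static void on_sigint(int signo) {
    (void)signo;
    g_cancel_requested = 1;
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_cancel_handler(void) {
    return signal(SIGINT, on_sigint) == SIG_ERR ? -1 : 0;
}

/* Polled between units of work to decide whether to stop early. */
int cancel_requested(void) {
    return g_cancel_requested != 0;
}
```

Whether polling is practical depends on how often the pipeline reaches a point where it can check the flag, which this sketch does not address.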
…utput
Two display bugs found during manual testing against a large repo:
1. Phases 6 and 7 were swapped: HTTP links fires before git history in the actual pipeline execution order. Swap their phase numbers in the sink.
2. "Done: 0 nodes" was shown because cbm_gbuf_dump_to_sqlite() frees node_by_qn before pipeline.done is logged, making cbm_gbuf_node_count() return 0. Fix: capture node/edge counts from the gbuf.dump event (which fires with the real counts before the hash table is freed) and use them for the Done: display line.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the pre-scan cbm_progress_sink_init(stderr) in main() with a temporary log level raise to WARN around cbm_mem_init(). This suppresses the mem.init log line without installing the sink twice; run_cli() remains the sole owner of the progress sink lifecycle.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two bugs found when using CMM tools on a project opened from a different
working directory:
1. configure_pragmas (store.c): busy_timeout was set AFTER journal_mode=WAL,
so the WAL pragma could block indefinitely waiting for a write lock with
no timeout. Reorder: set busy_timeout=10000 first, then journal_mode=WAL.
2. handle_get_architecture (mcp.c): resolve_store uses SQLITE_OPEN_CREATE so
it always returns a non-NULL store even for unindexed projects, causing
get_architecture to return {total_nodes:0,total_edges:0} silently instead
of an error. Add cbm_store_get_project check after resolve_store and return
a clear "project not indexed" error when the project row is absent.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add static helper verify_project_indexed() before handle_get_graph_schema
- Replace inline guard in handle_get_architecture with helper call
- Apply guard to handle_search_graph, handle_get_graph_schema,
handle_trace_call_path, handle_get_code_snippet, handle_query_graph
- All five query handlers now return {"error":"project not indexed —
run index_repository first"} instead of silently returning empty results
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add cbm_store_open_path_query() that opens with SQLITE_OPEN_READWRITE only (no SQLITE_OPEN_CREATE); returns NULL when file is absent
- Declare cbm_store_open_path_query() in store.h
- Change resolve_store() in mcp.c to call cbm_store_open_path_query so querying a nonexistent project never creates a ghost .db file
- Indexing path (cbm_store_open_path) retains SQLITE_OPEN_CREATE
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Asserts that query handlers return a guard error for unknown projects and that no ghost .db file is created in the cache directory.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Update stale comment in verify_project_indexed: resolve_store now uses cbm_store_open_path_query (no SQLITE_OPEN_CREATE), so store is NULL for missing files; the helper catches the empty-but-present .db case
- Guard state updates in resolve_store behind successful open check: only set owns_store=true and update current_project when store is non-NULL, preventing misleading state when an unknown project is queried
- Expand smoke_guard.sh to test all 5 guarded handlers (search_graph, query_graph, get_graph_schema, trace_call_path, get_code_snippet) instead of search_graph only; each checks both the guard error and no-ghost-file invariant
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix handle_get_code_snippet inline store-NULL check to return the same
JSON error format as REQUIRE_STORE and the other inline checks:
{"error":"no project loaded"} instead of a plain string
- Fix smoke_guard.sh query_graph invocation: pass "query" parameter
(not "cypher") to match what handle_query_graph actually reads; the
wrong key caused the handler to early-return before reaching the guard
- Remove extra blank line in handle_get_graph_schema guard block
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yyjson_mut_obj_add_str stores the string pointer without copying it. The ADR read path allocated buf, added it to the yyjson doc, then immediately freed buf, leaving the doc with a dangling pointer. When yy_doc_to_str serialized the doc, it read freed memory, producing garbage bytes. cbm_jsonrpc_format_response then called yyjson_read on the corrupted JSON, which failed silently, so no "result" field was emitted and the MCP client hung waiting for a valid response.
Fix: hoist adr_buf to function scope, initialized to NULL, and free it after yy_doc_to_str has serialized the document.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Adds a `--progress` flag to the CLI that shows live, human-readable progress on stderr while keeping the JSON result on stdout unchanged.
```
$ codebase-memory-mcp cli --progress index_repository '{"repo_path":"/path/to/repo"}'
Discovering files (6402 found)
Starting full index
[1/9] Building file structure
Extracting: 6300/6402 files (98%)
[2/9] Extracting definitions
[3/9] Building registry
[4/9] Resolving calls & edges
[5/9] Detecting tests
[6/9] Scanning HTTP links
[7/9] Analyzing git history
[8/9] Linking config files
[9/9] Writing database
Done: 71645 nodes, 106757 edges (9422 ms)
{"project":"...","status":"indexed","nodes":71645,"edges":106757}
```
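
The `Extracting:` line in the sample above is updated in place. A minimal sketch of how such a line could be produced with `\r`, with a pending-newline flag so the next phase label starts on a fresh line, is shown here. The function names and the exact line layout are assumptions, not taken from the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Nonzero while an in-place line is on screen without a trailing newline. */
static int s_pending_newline = 0;

/* Format one in-place progress line; the leading \r returns the cursor to
 * column 0 so the line overwrites the previous update.
 * Returns the snprintf result. */
int format_extract_progress(char *buf, size_t size, long done, long total) {
    assert(buf != NULL && size > 0 && total > 0);
    long pct = done * 100 / total; /* integer percent, rounded down */
    return snprintf(buf, size, "\rExtracting: %ld/%ld files (%ld%%)",
                    done, total, pct);
}

/* Write an in-place update and remember that the line is unterminated. */
void show_extract_progress(FILE *out, long done, long total) {
    char line[96];
    format_extract_progress(line, sizeof line, done, total);
    fputs(line, out);
    fflush(out); /* no newline, so flush explicitly */
    s_pending_newline = 1;
}

/* Write a normal line, first terminating any pending in-place line. */
void show_phase_line(FILE *out, const char *text) {
    if (s_pending_newline) {
        fputc('\n', out);
        s_pending_newline = 0;
    }
    fprintf(out, "%s\n", text);
}
```

Calling `show_extract_progress(stderr, 6300, 6402)` followed by `show_phase_line(stderr, "[2/9] Extracting definitions")` would leave the final extract count on its own line above the phase label.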
Design
No pipeline changes. The implementation registers a custom log sink via `cbm_log_set_sink()` in `run_cli()` that maps existing structured log events (`pass.start`, `pass.timing`, `pipeline.done`, etc.) to human-readable phase labels. When a sink is registered, it becomes the sole output handler: the default `fprintf(stderr, ...)` is suppressed, which keeps stderr clean.
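
The "sole output handler" behaviour can be illustrated with a small stand-in logger. This is a sketch under assumptions: the real `cbm_log_sink_fn` signature and `cbm_log_set_sink()` return type are not shown in this PR description, so the `demo_*` names and signatures below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sink signature; the real cbm_log_sink_fn may differ. */
typedef void (*demo_sink_fn)(const char *line);

static demo_sink_fn g_sink = NULL;
static int g_sink_calls = 0;

/* Install a sink and return the previous one so it can be restored later. */
demo_sink_fn demo_log_set_sink(demo_sink_fn fn) {
    demo_sink_fn prev = g_sink;
    g_sink = fn;
    return prev;
}

/* Emit one log line. With a sink installed, the sink is the only output
 * path; otherwise the line goes to the fallback stream.
 * Returns true if a sink handled the line. */
bool demo_log(FILE *fallback, const char *line) {
    assert(fallback != NULL && line != NULL);
    if (g_sink != NULL) {
        g_sink(line);
        return true;
    }
    fprintf(fallback, "%s\n", line);
    return false;
}

/* Example sink that only counts the lines it receives. */
void counting_sink(const char *line) {
    (void)line;
    g_sink_calls++;
}
```

Returning the previous sink from the setter is one way to support the save-and-restore lifecycle that the init/fini pair is described as providing.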
Key implementation details:
Test plan
🤖 Generated with Claude Code